A Bayesian Method for Guessing the Extreme Values in a Data Set

نویسندگان

  • Mingxi Wu
  • Chris Jermaine
چکیده

For a large number of data management problems, it would be very useful to be able to obtain a few samples from a data set, and to use the samples to guess the largest (or smallest) value in the entire data set. Min/max online aggregation, top-k query processing, outlier detection, and distance join are just a few possible applications. This paper details a statistically rigorous, Bayesian approach to attacking this problem. Just as importantly, we demonstrate the utility of our approach by showing how it can be applied to two specific problems that arise in the context of data management.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Method for Ranking Extreme Efficient DMUs Based on Changing the Reference Set with Using L2 - Norm

The purpose of this study is to utilize a new method for ranking extreme efficient decision making units (DMUs) based upon the omission of these efficient DMUs from reference set of inefficient and non-extreme efficient DMUs in data envelopment analysis (DEA) models with constant and variable returns to scale. In this method, an L2- norm is used and it is believed that it doesn't have any e...

متن کامل

Bayesian Sample size Determination for Longitudinal Studies with Continuous Response using Marginal Models

Introduction Longitudinal study designs are common in a lot of scientific researches, especially in medical, social and economic sciences. The reason is that longitudinal studies allow researchers to measure changes of each individual over time and often have higher statistical power than cross-sectional studies. Choosing an appropriate sample size is a crucial step in a successful study. A st...

متن کامل

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

متن کامل

A Hybrid Machine Learning Method for Intrusion Detection

Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Already various techniques of artificial intelligence have been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...

متن کامل

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007